Abstract: We study the adaptation dynamics of an initially maladapted
asexual population with genotypes represented by binary sequences of length
$L$. The population evolves in a maximally rugged fitness landscape with
a large number of local optima. We find that whether the evolutionary
trajectory is deterministic or stochastic depends on the effective mutational
distance $d_{\mathrm{eff}}$ up to which the population can spread in genotype
space. For $d_{\mathrm{eff}} = L$, the deterministic quasispecies theory
operates, while for $d_{\mathrm{eff}} < 1$ the evolution is completely
stochastic. Between these two limiting cases, the dynamics are described by a
local quasispecies theory below a crossover time $T_\times$, while above
$T_\times$ the population gets trapped at a local fitness peak and manages to
find a better peak either via stochastic tunneling or double mutations. In
the stochastic regime $d_{\mathrm{eff}} < 1$, we identify two subregimes
associated with clonal interference and uphill adaptive walks, respectively.
We argue that our findings are relevant to the interpretation of evolution
experiments with microbial populations.
The question of whether the course of evolution is predetermined and, if so,
to what extent and under what conditions, has recently attracted the
attention of many researchers (Wahl and Krakauer 2000; Rouzine et al. 2001;
Yedid and Bell 2002). The answer to this question, particularly for
large populations, is not obvious since the trajectories traced out during
evolution are shaped by the interplay of the (deterministic) selective forces
encoded in the fitness landscape and the stochasticity of the mutational
process, which limits the ability of the population to find and maintain
favorable genotypes.

We address this question for an asexual population of size $N$ and binary
genotype sequences of length $L$ evolving on a fitness landscape. As
there is considerable evidence (Whitlock et al. 1995) for interactions
amongst gene loci (or epistasis), it is important to consider the
evolutionary process on a landscape that includes them. Such interactions may
(Wright 1932; Gavrilets 2004; Weinreich et al. 2005) or may not
(Lunzer et al. 2005; Weinreich et al. 2006) give rise to multiple peaks
in the fitness landscape (Jain and Krug 2006). But at least on a qualitative
level, recent experiments on microbial populations (Elena and Lenski 2003)
support the notion that the fitness landscape underlying the adaptive
process has multiple peaks (Lenski and Travisano 1994; Korona et al. 1994;
Burch and Chao 1999, 2000; Elena and Sanjuán 2003). Motivated by this, we
consider the dynamics of the evolutionary process on maximally rugged
landscapes (Kauffman and Levin 1987; Flyvbjerg and Lautrup 1992), which have
high epistasis and a large number of adaptive peaks separated by valleys.

A detailed theoretical description of the evolution of a population subject
to the combined effects of selection, mutation and stochastic drift in
a complex fitness landscape constitutes a formidable problem, and previous
studies have usually considered two limiting cases based on the size $N$
of the population and the mutation probability $\mu$ per generation per base
(or gene locus). When the total number of mutants produced in a generation,
$NL\mu$, is small, the population consists of a single genotype at most
times. Occasionally a mutation occurs in a single individual, which may
become fixed in the population with a probability depending on the fitness
advantage of the mutant. The population thus performs an adaptive
walk along a set of genotypes connected by single point mutations, which
is biased towards high fitnesses and terminates at a local fitness maximum
(Gillespie 1984; Kauffman and Levin 1987; Macken and Perelson 1989;
Macken et al. 1991; Flyvbjerg and Lautrup 1992; Orr 2002).
Clearly, the trajectory traced out by the population in this case is
determined stochastically. In the other extreme limit of $N \to \infty$,
applicable to enormously large populations, each (relevant) genotype is
populated by many individuals and the stochasticity inherent in the selection
of individuals for reproduction can be neglected. This is the regime of
deterministic mutation-selection dynamics described by the quasispecies
model, which was originally introduced in the context of prebiotic molecular
evolution (Eigen 1971; Eigen et al. 1989; Baake and Gabriel 2000;
Jain and Krug 2006).

Thus, in these two extreme cases either the population has many weighted
paths available or follows a single predetermined route to the global peak.
One would like to know: what is the nature of the dynamics for parameters
lying between these two limits? In the following section, we describe the
model and introduce a parameter $d_{\mathrm{eff}}$ on the basis of which
various dynamical regimes are distinguished. The effective distance
$d_{\mathrm{eff}}$ is basically a measure of the extent to which a finite
population can spread in the space of genotype sequences by mutations. For
infinite populations, this distance equals the diameter $L$ of the entire
sequence space, and we discuss this case in the section on quasispecies
dynamics. We start with our earlier work on quasispecies evolution
(Krug and Karl 2003; Jain and Krug 2005), which provides, in a suitably
defined strong selection limit, a very transparent picture of the
evolutionary trajectories and the genotypes that are encountered by a
population moving towards the global fitness peak. We show that, provided the
mutation probability $\mu$ is sufficiently small, the analysis of
Jain and Krug (2005) holds beyond the strong selection limit and the
evolutionary trajectories obtained at different values of $\mu$ can be
superimposed by a simple rescaling of time. The section on finite populations
deals with the two subcases $1 < d_{\mathrm{eff}} < L$ and
$d_{\mathrm{eff}} < 1$. The basic idea in the first case is that the
finite population behaves like a quasispecies in an effective sequence space
up to a certain timescale, above which the stochastic evolution takes over.
We estimate the time at which the crossover from deterministic to stochastic
evolution occurs. For $d_{\mathrm{eff}} < 1$, the dynamics are stochastic at
all times but, depending on the product $NL\mu$, the dynamics may be
characterized by the "clonal interference" of several genotypes
(Gerrish and Lenski 1998) or may follow the adaptive walk scenario described
above. In each case, we describe several individual fitness trajectories in
detail, both as a function of time and as a function of the system
parameters. Finally, in the last section we summarize our results and discuss
the relation of this work to that of others.



MODELS 

We consider a haploid, asexual population with genotypes drawn from
the space of binary sequences $\sigma = \{\sigma_1, \ldots, \sigma_L\}$ of
length $L$, where $\sigma_i = 0$ or 1. Depending on the context, a genotype
can be thought to represent a small genome, a single gene or a sequence of
$L$ biallelic genetic loci. A fitness $W(\sigma) > 0$ proportional to the
expected number of offspring produced by an individual of genotype $\sigma$
is associated with each sequence. Reproduction occurs in discrete,
non-overlapping generations. The structure of the population is monitored
through the frequency $X(\sigma, t)$ of individuals of genotype $\sigma$ in
generation $t$.

To simulate the stochastic evolution, a population of fixed size $N$ is
propagated via standard Wright-Fisher sampling, i.e. each individual in the
new population chooses an ancestor from the old population with a probability
proportional to the fitness of the ancestor. Subsequently, point mutations
are introduced with probability $\mu$ per locus per generation. In the limit
of very large populations, this leads to a deterministic time evolution for
the average frequency $\langle X(\sigma, t) \rangle$, where the angular
brackets refer to an average over all realizations of the sampling process;
for brevity we continue to write $X(\sigma, t)$ for this average. The
evolution equation reads (Jain and Krug 2006)

\[
X(\sigma, t+1) = \frac{\sum_{\sigma'} p_{\sigma \leftarrow \sigma'} W(\sigma') X(\sigma', t)}{\sum_{\sigma'} W(\sigma') X(\sigma', t)} \tag{1}
\]

where

\[
p_{\sigma \leftarrow \sigma'} = \mu^{d(\sigma, \sigma')} (1 - \mu)^{L - d(\sigma, \sigma')} \tag{2}
\]

is the probability of producing $\sigma$ as a mutant of $\sigma'$ in one
generation, and $d(\sigma, \sigma')$ denotes the Hamming distance between the
two genotypes (i.e. the number of single point mutations in which they
differ). Instead of simulating a large (infinite) population, we numerically
iterate the above discrete time equation. For future reference we note that
the nonlinear evolution (1) is equivalent to the linear iteration

\[
Z(\sigma, t+1) = \sum_{\sigma'} p_{\sigma \leftarrow \sigma'} W(\sigma') Z(\sigma', t) \tag{3}
\]

for the unnormalized frequency $Z(\sigma, t)$, where
$X(\sigma, t) = Z(\sigma, t) / \sum_{\sigma'} Z(\sigma', t)$
(Jain and Krug 2006).
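
The deterministic iteration described above can be sketched numerically as
follows. This is a minimal illustration, not the code used for the results
reported here: the function names, the random seed, the sequence length
$L = 6$ and the mutation probability $\mu = 10^{-4}$ are assumptions made for
the example, and the fitness values are drawn from an exponential distribution
with unit mean.

```python
import itertools
import numpy as np

def hamming_matrix(L):
    """Pairwise Hamming distances between all 2**L binary genotypes."""
    genotypes = np.array(list(itertools.product([0, 1], repeat=L)))
    return (genotypes[:, None, :] != genotypes[None, :, :]).sum(axis=2)

def mutation_matrix(L, mu):
    """p[s, s'] = mu**d * (1 - mu)**(L - d), with d the Hamming distance (eq. 2)."""
    d = hamming_matrix(L)
    return mu**d * (1.0 - mu)**(L - d)

def quasispecies_step(X, W, p):
    """One generation of the normalized (nonlinear) iteration, eq. (1)."""
    Z = p @ (W * X)
    return Z / Z.sum()

def linear_step(Z, W, p):
    """One generation of the unnormalized (linear) iteration, eq. (3)."""
    return p @ (W * Z)

# Illustrative parameters (assumptions, not the reference landscape of the text).
L, mu = 6, 1e-4
rng = np.random.default_rng(0)
W = rng.exponential(1.0, size=2**L)   # uncorrelated fitness values, unit mean
p = mutation_matrix(L, mu)

X = np.zeros(2**L)
X[0] = 1.0                            # whole population at the starting genotype
for t in range(200):
    X = quasispecies_step(X, W, p)

print("most populated genotype index:", int(np.argmax(X)))
```

Because each column of the mutation matrix sums to one, normalizing the
result of the linear step reproduces the nonlinear step exactly, which is the
equivalence between (1) and (3) noted above.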
In order to generate a maximally rugged fitness landscape, the fitness
values $W(\sigma)$ are chosen independently from a common exponential
distribution $P(W) = e^{-W}$ with unit mean. In the language of Kauffman's
rugged landscape models, in which the fitness contribution of each of the $L$
loci depends randomly on $K$ other loci, our uncorrelated landscape
corresponds to $K = L - 1$ and hence the limit of strong epistasis
(Kauffman 1993). In particular, sign epistasis, in the sense that a
particular point mutation may be beneficial or deleterious depending on the
genetic background (Weinreich et al. 2005), is common in these landscapes. We
also note that the selection coefficient for a mutant of genotype $\sigma$ in
a background of genotype $\sigma'$ is given by

\[
s(\sigma, \sigma') = \frac{W(\sigma)}{W(\sigma')} - 1 \tag{4}
\]

and the probability to find a genotype of fitness larger than $W$ is

\[
Q(W) = \int_W^{\infty} dw \, P(w) = e^{-W}. \tag{5}
\]

We recall some typical properties of maximally rugged landscapes
(Kauffman and Levin 1987; Kauffman 1993), which follow from elementary order
statistics. For $S$ exponentially distributed random variables, the average
value of the maximum is given by $\ln S$ (David 1970; Sornette 2000), which
yields the expected fitness $W_{\max} = L \ln 2$ of the globally fittest
among the $2^L$ genotypes. Correspondingly, the typical fitness of a local
maximum, which is a genotype without fitter one-mutant neighbors, is
$W_{\mathrm{loc}} = \ln(L + 1) \ll W_{\max}$. Since the probability that a
genotype is a local maximum is $1/(L + 1)$, there are on average
$2^L/(L + 1)$ local maxima in these landscapes. For such a genotype $\sigma$
with fitness $W_{\mathrm{loc}}$ surrounded by typical genotypes of fitness
$W = 1$, the selection coefficient $s(\sigma, \sigma') \sim \ln L \gg 1$. In
this sense, we are dealing with a situation of strong selection throughout
this paper.

For the purposes of illustration, we will base much of the discussion below
on two reference landscapes, each of which is a single realization of a
landscape, with sequence length $L = 15$ and $L = 6$, respectively. The
starting sequence $\sigma^{(0)}$ is a randomly chosen genotype at which the
population finds itself in the beginning of the adaptation process. For our
reference landscapes, $\sigma^{(0)}$ is of relatively poor fitness, with
value $W(\sigma^{(0)}) \approx 0.13$ in both cases. This has rank 28795 among
$2^{15} = 32768$ genotypes and rank 55 among $2^6 = 64$ genotypes, where
the global maximum is assigned the rank 0. The global peak is located at
Hamming distance 10 and 2 from $\sigma^{(0)}$, with fitness
$W_{\max} = 10.72$ and 4.29 for $L = 15$ and $L = 6$, respectively. In the
following discussion, instead of specifying actual fitness values for each
sequence, we will provide their ranks as a subscript in the population
density $X_{\mathrm{rank}}(\sigma, t)$.
In the subsequent sections, we will distinguish the dynamics on the basis of
a parameter $d_{\mathrm{eff}}$, which is a measure of the typical extension
of the population in genotype space. For strong selection, it is given by

\[
d_{\mathrm{eff}} \approx \frac{\ln N}{|\ln \mu|}. \tag{6}
\]

Due to the quasispecies equation (1), the average number of individuals
produced in one generation at a sequence $\sigma$ located at distance
$d(\sigma, \sigma^{(0)})$ from a localized population of size $N$ is given by
$N \mu^{d(\sigma, \sigma^{(0)})}$. Thus the maximum distance
$d_{\mathrm{eff}}$ at which at least one individual (required for asexual
reproduction) can be detected after one generation is given by (6). However,
in the next generation, the mutants of $\sigma^{(0)}$ can acquire further
mutations, thus extending the spread of the population beyond
$d_{\mathrm{eff}}$. We argue below that for landscapes with large selection
coefficients, as is the case with our rugged landscapes, the above definition
is nevertheless a good approximation.
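
As a quick numerical illustration of this definition, the short sketch below
evaluates $\ln N / |\ln \mu|$ for a few parameter choices. The function names
are hypothetical, and the parameter values ($N = 2^{14}$ with
$\mu = 10^{-4}, 10^{-3}, 10^{-2}$) are chosen here only as examples.

```python
import math

def d_eff(N, mu):
    """Effective mutational distance ln N / |ln mu| for strong selection."""
    return math.log(N) / abs(math.log(mu))

def population_for_full_spread(mu, L):
    """Population size at which d_eff reaches the diameter L, i.e. N = mu**(-L)."""
    return mu ** (-L)

N = 2**14
for mu in (1e-4, 1e-3, 1e-2):
    print(f"N = 2^14, mu = {mu:g}: N*mu = {N * mu:.2f}, d_eff = {d_eff(N, mu):.2f}")

# Population size needed for a fully deterministic description at L = 6, mu = 1e-4.
print(f"N for d_eff = L: {population_for_full_spread(1e-4, 6):.1e}")
```

Since $d_{\mathrm{eff}}$ depends only logarithmically on $N$, doubling it at
fixed $\mu$ requires squaring the population size.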

To see this, let us consider the evolution in a landscape with infinite
selection coefficients, for which (6) is exact. As argued above, starting
from a localised population at $\sigma^{(0)}$, at $t = 1$ the population
spreads over a typical distance $d_{\mathrm{eff}}$. If the landscape is such
that all the sequences except the best one amongst the ones available within
$d_{\mathrm{eff}}$ are lethal (i.e. with fitness zero), then in the next
generation the population will move to the lone viable genotype (fitter than
$\sigma^{(0)}$). This sequence in turn can be treated as the new
$\sigma^{(0)}$, and the above argument can be applied recursively.

That (6) cannot hold for weak selection can be seen by considering the
flat fitness landscape (with selection coefficients zero), for which it is
known that the average Hamming distance over which the population spreads is
(Derrida and Peliti 1991)



2 (TT 



^Ha.--(^^J, (7) 



and which for large $N\mu$ is simply $L/2$. Away from these two limiting
cases, one may expect an explicit dependence on the relevant selection
coefficients. For rugged landscapes, one can get an idea of such a dependence
at late times when (as explained in the section on finite populations) the
population gets trapped at a sequence whose mutants within $d_{\mathrm{eff}}$
are not fitter than itself. In such a case, the population at the peak and
its surrounding valley reaches a stationary state and forms a quasispecies.
Approximating the surrounding genotypes by a flat fitness landscape with
$W(\sigma') = 1$ and the localising sequence with fitness
$W(\sigma) \gg 1$, an analysis within the unidirectional approximation shows
that the population distribution is an exponential (Higgs 1994).
Defining $d_{\mathrm{eff}}$ as the genetic distance at which the population
fraction falls to $1/N$, the resulting expression for $d_{\mathrm{eff}}$ is
given by

\[
d_{\mathrm{eff}} \approx \frac{\ln N}{|\ln \mu|} \left( 1 + \frac{1}{s \, |\ln \mu|} \right) \tag{9}
\]

which reduces to (6) with a correction that becomes negligible for
$s(\sigma, \sigma') \, |\ln \mu| \gg 1$. When either the selection is weak or
the mutation rate is large, the effective mutational distance is larger than
given by (6). In the following sections, we will study three distinct cases
classified on the basis of the distance (6): (a) $d_{\mathrm{eff}} = L$,
(b) $1 < d_{\mathrm{eff}} < L$, (c) $d_{\mathrm{eff}} < 1$.



QUASISPECIES DYNAMICS 

When the population size $N > \mu^{-L}$, the effective distance
$d_{\mathrm{eff}} = L$ and the population can spread all over the Hamming
space. For the small mutation probabilities $\mu$ that we consider here, this
population size far exceeds the number of available genotypes. The
requirement of such a large population size for a completely deterministic
description comes from (2), according to which the mutation probability
decreases exponentially with the distance.

The discrete time quasispecies equation (1) was iterated numerically for
the population fraction $X$, where we have labeled each sequence by its
rank and its Hamming distance from $\sigma^{(0)}$. The time evolution is
depicted in Figure 1 for various $\mu$ and fixed $L$. Since
$X(\sigma, 1) \sim \mu^{d(\sigma, \sigma^{(0)})}$, all the mutants
become available immediately, with a concentration decreasing exponentially
with the distance from the parent sequence. As a result, the population
at fitter sequences closer to the parent increases and that at
$\sigma^{(0)}$ decreases. One of these fit sequences becomes dominant in the
sense that it supports the largest population. This sequence is in turn
overtaken by a fitter sequence close to it, and this process of leadership
changes goes on until the population has reached the global maximum. We are
interested in the evolutionary trajectory traced out by the most populated
sequence $\sigma^*(t)$ at time $t$.

The analysis of Krug and Karl (2003) and Jain and Krug (2005) provides a
simple way of identifying the genotype $\sigma^*$ for a given landscape and
a given starting sequence. It is based on a particular strong selection
limit, in which the mutation rate is scaled to zero and the fitnesses are
scaled to infinity in such a way that the (appropriately normalized)
logarithmic population fractions remain well behaved. The key observation is
that the behavior of the evolutionary trajectory $\sigma^*(t)$ can be
accurately predicted by simply assuming that the mutations can be turned off
once the sequence space has been "seeded" by the population fraction
$\sim \mu^d$ that is established by mutations after the first generation.
Thus, each unnormalised population frequency $Z(\sigma, t)$ changes
exponentially in time according to its own fitness, from an initial value
proportional to $\mu^{d(\sigma, \sigma^{(0)})}$. In logarithmic variables,
this implies the simple linear time evolution (see also Zhang (1997))

\[
\ln Z(\sigma, t) = -|\ln \mu| \, d(\sigma, \sigma^{(0)}) + \ln W(\sigma) \, t. \tag{10}
\]

Since the first term on the right hand side is the same for all sequences
in a shell of constant Hamming distance $d(\sigma, \sigma^{(0)})$, within
each shell only the sequence with the largest fitness needs to be considered
for determining $\sigma^*(t)$. It is also evident from (10) that among these
shell fitnesses only fitness records, that is, sequences whose fitness is
larger than the fitnesses in all shells closer to $\sigma^{(0)}$, can
possibly partake in the evolutionary trajectory. Fitness records can be
identified purely on the basis of the fitness rank. Their statistical
properties are independent of the underlying fitness distribution, but depend
on the geometry of the sequence space (Jain and Krug 2005;
Krug and Jain 2005).
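
The selection of shell maxima and fitness records, and the resulting sequence
of leadership changes implied by the linear evolution (10), can be sketched
as follows. This is an illustrative reading of the prescription rather than
the authors' code: the function names, the integer encoding of genotypes and
the random landscape are assumptions. The next leader is taken to be the
record whose line in (10) first crosses that of the current leader; records
that are never first to cross are skipped.

```python
import math
import numpy as np

def hamming_from(start, L):
    """Hamming distance of every genotype (as an integer) from `start`."""
    return np.array([bin(g ^ start).count("1") for g in range(2**L)])

def shell_records(W, dist):
    """(distance, genotype) of shell maxima that beat all closer shells."""
    records, best = [], -math.inf
    for d in range(int(dist.max()) + 1):
        shell = np.flatnonzero(dist == d)
        top = int(shell[np.argmax(W[shell])])
        if W[top] > best:
            best = W[top]
            records.append((d, top))
    return records

def predicted_trajectory(W, dist, mu):
    """Successive most populated genotypes according to eq. (10)."""
    records = shell_records(W, dist)
    path, i = [records[0][1]], 0
    while i + 1 < len(records):
        d_i, g_i = records[i]
        # Time at which each later record overtakes the current leader.
        crossing = [abs(math.log(mu)) * (d_j - d_i) / math.log(W[g_j] / W[g_i])
                    for d_j, g_j in records[i + 1:]]
        i += 1 + int(np.argmin(crossing))
        path.append(records[i][1])
    return path

# Illustrative landscape (an assumption, not the reference landscape of the text).
L, mu, start = 6, 1e-4, 0
rng = np.random.default_rng(2)
W = rng.exponential(1.0, size=2**L)
dist = hamming_from(start, L)
print("records (shell, genotype):", shell_records(W, dist))
print("predicted trajectory:", predicted_trajectory(W, dist, mu))
```

Identifying the records needs only fitness ranks, whereas the crossing times
depend on the actual fitness values through $\ln W$.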

The set of sequences $\{\sigma^*\}$ making up the evolutionary trajectory is
a subset of the fitness records, from which those records are eliminated
which are bypassed by a fitter but more distant record before reaching the
status of the most populated genotype (Krug and Karl 2003; Sire et al. 2006).
To decide whether a given record is bypassed, the actual fitness values and
not just their ranks are needed. Bypassing is a significant effect: it
reduces the number of steps in the evolutionary trajectory from the number
of records, which is of order $L$, to the order $\sqrt{L}$ for logarithmic
fitness distributions of the exponential type (Jain and Krug 2005;
Krug and Jain 2005). Thus, not all of the $L + 1$ mutant classes can appear
in the trajectory and in fact, only a vanishing fraction of the total of
$2^L$ genotypes actually appear (Figure 1).

Figure 1: Quasispecies evolution of the population fraction $X$, with each
sequence labeled by its rank and its Hamming distance from $\sigma^{(0)}$.
The numerical iteration of equation (1) is shown for three values of $\mu$
(top to bottom) with $L = 15$, starting from all the population at sequence
$\sigma^{(0)}$ in the fitness landscape explained in the text. The sequences
with fraction > 0.005 are shown.

Figure 2: Punctuated rise of the average fitness $\bar{W}(t)$ for fixed
landscape and fixed initial condition in the quasispecies model with genome
length $L = 15$. The solid line is the fitness $W_{\max}$ of the global
maximum and the broken one is $e^{-\mu L} W_{\max}$ with $\mu = 10^{-2}$. The
steps become more diffuse as $\mu$ increases, and the fitness level is
reduced for the largest value of $\mu$ due to the broadening of the genotype
distribution. Inset: Average fitness plotted as a function of
$t / |\ln \mu|$ to show the scaling of jump times.

When applied to our reference landscape, the above analysis predicts an
evolutionary trajectory involving the genotypes with ranks 28795, 4688, 5, 1
and 0 in shells 0, 1, 2, 7 and 10, respectively, which are precisely the ones
that appear in Figure 1. Each of these genotypes is also a record, none of
which is bypassed in the landscape used here. To see bypassing of the
contending genotypes, we need to consider larger $L$, as the number of
bypassed sequences increases with $L$.
As Figure 1 shows, although the set $\{\sigma^*\}$ remains the same for a
broad range of mutation probability $\mu$, the timing of the appearance of
new mutants and the polymorphism of the population depend on $\mu$. These
effects are also reflected in the stepwise behavior of the population
averaged fitness $\bar{W}(t) = \sum_{\sigma} W(\sigma) X(\sigma, t)$ in
Figure 2. For smaller $\mu$, adaptive events occur at later times. This is
expected on the basis of (10), from which $\mu$ can be eliminated by a
rescaling of time with $|\ln \mu|$. Indeed, the inset shows that
the timing of the peak shifts can be made to coincide by scaling time with
$|\ln \mu|$. The other effect with increasing $\mu$ is that the transitions
between fitness peaks become more gradual (Krug and Halpin-Healy 1993), and
the fitness level at a given (rescaled) time is lowered. This happens due to
an increase in the diversity (the number of genotypes present in the
population), which is controlled by the probability
$1 - (1 - \mu)^L \approx \mu L$ for any mutation to occur. For the largest
mutation probability $\mu = 10^{-2}$ that we consider here, this probability
is significant and the mutational load (Haldane 1927) can be estimated as
follows. Using the quasispecies equation (1) in the steady state within the
unidirectional approximation (Jain and Krug 2006) for the master sequence
with fitness $W(\sigma^*)$, it immediately follows that the population
fitness is given by $W(\sigma^*) e^{-\mu L}$ for large $L$ and small $\mu$.
The mutational load is thus $W(\sigma^*) (1 - e^{-\mu L})$ and the fitness is
reduced by a factor $e^{-\mu L} \approx 0.86$ for $\mu = 0.01$ and $L = 15$,
in very good agreement with the data in Figure 2. To summarise, the mutations
affect the dynamics in two respects: on decreasing $\mu$, the new mutants get
fitter but are slower to appear ("slow-but-fit").
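
The step from the steady state to the load estimate can be written out
explicitly. The following is a sketch under the stated unidirectional
approximation, in which mutations back into the master sequence $\sigma^*$
are neglected, so that in equation (1) only the unmutated offspring of
$\sigma^*$ contribute to its own frequency.

```latex
% Steady state of eq. (1) for the master sequence, back mutations neglected:
\[
X(\sigma^*) = \frac{(1-\mu)^L \, W(\sigma^*) \, X(\sigma^*)}{\bar{W}}
\quad\Longrightarrow\quad
\bar{W} = (1-\mu)^L \, W(\sigma^*) \approx e^{-\mu L} \, W(\sigma^*) .
\]
% Numerical value for mu = 0.01 and L = 15:
\[
(1 - 0.01)^{15} \approx 0.860 , \qquad e^{-0.15} \approx 0.861 .
\]
```

Both forms round to the factor 0.86 quoted above, so the replacement of
$(1-\mu)^L$ by $e^{-\mu L}$ is harmless at these parameter values.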



FINITE POPULATIONS 



As we discussed above, in the infinite population limit all the genotypes
are immediately occupied, so that the subsequent dynamics involving the fit
genotypes can be approximated as largely due to the selection process. For
finite $N$, on the other hand, the population distribution has a finite
support $d_{\mathrm{eff}}$ at any time. Then if the distance to the genotype
that offers a selective advantage over the currently dominant one is larger
than $d_{\mathrm{eff}}$, or the distance $d_{\mathrm{eff}}$ is less than
unity, the average number of individuals at the desired distance is smaller
than one. One cannot work with averages under such circumstances and must
take fluctuations arising due to rare mutations into account.
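
A single generation of the stochastic dynamics, Wright-Fisher sampling
followed by point mutations, can be sketched as below. This is a minimal
illustration under stated assumptions and not the simulation code behind the
figures: genotypes are encoded as integers whose bits are the loci, and the
function name, random seed, population size and mutation probability are
hypothetical choices.

```python
import numpy as np

def wright_fisher_step(pop, W, L, mu, rng):
    """One generation: fitness-proportional choice of ancestors, then mutation.

    pop : int64 array of length N holding the genotype index of each individual.
    """
    N = pop.size
    fitness = W[pop]
    # Each offspring picks an ancestor with probability proportional to fitness.
    parents = rng.choice(N, size=N, p=fitness / fitness.sum())
    offspring = pop[parents]
    # Each locus of each offspring mutates independently with probability mu.
    flips = rng.random((N, L)) < mu
    masks = (flips * (1 << np.arange(L))).sum(axis=1)
    return offspring ^ masks

# Illustrative run (all parameter values are assumptions).
L, N, mu = 6, 2**10, 1e-3
rng = np.random.default_rng(3)
W = rng.exponential(1.0, size=2**L)
pop = np.full(N, int(np.argmin(W)), dtype=np.int64)   # start at a poor genotype
for t in range(100):
    pop = wright_fisher_step(pop, W, L, mu, rng)

counts = np.bincount(pop, minlength=2**L)
print("most populated genotype:", int(np.argmax(counts)),
      "fraction:", counts.max() / N)
```

The cost per generation is of order $N$ for the sampling and of order $NL$
for the mutation step in this simple version, which is why small $L$ and
moderately large $\mu$ keep the simulation cheap.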

Figure 3: Evolutionary trajectories in a sequence space of length $L = 6$
with $N = 2^{14}$, $\mu = 10^{-4}$ so that $N\mu \approx 1.64$ and
$d_{\mathrm{eff}} \approx 1.05$. The population fraction is denoted by
$X_{\mathrm{rank}}(\sigma)$, where the $\sigma_i$'s that do not change in the
course of time are represented by a dash. Only the sequences with population
fraction > 0.05 are shown. In the initial phase, the three populations
$X_{55}$, $X_{28}$ and $X_5$ occur in all of the above trajectories and have
rather similar curves, supporting deterministic evolution. At late times, the
population escapes the local peak with rank 5 via tunneling in the top panel
and by a double mutation in the middle one.

Figure 4: Evolutionary trajectories for $\mu = 10^{-3}$ (left) and
$\mu = 10^{-2}$ (right) with $N = 2^{14}$ and $L = 6$. In the left panel, the
effective distance $d_{\mathrm{eff}} \approx 1.4$ and the population passes
deterministically through the rank 28 sequence towards the global maximum. In
the right panel, $d_{\mathrm{eff}} \approx 2.1$ and the population reaches
the global maximum almost immediately.

Crossover from deterministic to stochastic dynamics: We first
consider the case when $1 < d_{\mathrm{eff}} < L$. Starting from a parent
sequence $\sigma^{(0)}$
supporting a population $N \ll \mu^{-L}$, the mutants can spread up to a
shell at a distance $d_{\mathrm{eff}} < L$. Then, provided the selection
coefficient involving the fittest and the next few fittest sequences within
$d_{\mathrm{eff}}$ is large, the dynamics within this distance are similar to
the quasispecies case in that the population at the fittest sequence in each
shell competes with the one in other occupied shells and, passing through
sequences at which it becomes dominant in the least time, finds the best
available sequence $\sigma^*$ within $d_{\mathrm{eff}}$ of $\sigma^{(0)}$.
The last step is akin to finding the global maximum in the quasispecies case.
If, however, the selection is not strong, several fit genotypes get
populated, and due to a mutation in this set of fit sequences, the population
may be able to find a sequence even fitter than the fittest sequence
$\sigma^*$ within $d_{\mathrm{eff}}$. In such an event, the fittest
sequence within $d_{\mathrm{eff}}$ still achieves a majority status, but only
momentarily. A similar process is repeated within shells at radius
$d_{\mathrm{eff}}$ from the new most populated sequence $\sigma^*$. The above
deterministic process is expected to occur for individual trajectories
obtained in stochastic simulations as long as the population can find a
sequence better than the current $\sigma^*$ within a distance
$d_{\mathrm{eff}}$. In particular, for $d_{\mathrm{eff}} \approx 1$, the
local quasispecies evolution continues until the population hits a local
peak, after which stochastic evolution takes over. The latter typically
involves "crossing the valley" via less fit nearest neighbor mutants to a
better peak than the current one.

In Figure 3 we chose $d_{\mathrm{eff}}$ slightly above unity; since at any
time the population can typically sense only $L$ sequences, we work with a
small sequence space of length $L = 6$ to reduce the number $2^L - L$ of
unoccupied sequences. Also, we keep the mutation probability $\mu$ somewhat
large since for $d_{\mathrm{eff}}$ close to one, $N \sim \mu^{-1}$ and
Wright-Fisher sampling requires operations of order $N$ per time step. Note
that in this case the number of genotypes $2^L = 64$ is much smaller than the
population size. Nevertheless, we will see that the dynamics is far from the
deterministic quasispecies limit, because the more stringent condition
$d_{\mathrm{eff}} = L$ is not met. Since doubling $d_{\mathrm{eff}}$ requires
increasing the population size from $N$ to $N^2$, it is clear that fully
deterministic behavior can be realized only under extreme conditions.

Deterministic dynamics: The different runs in Figure 3 correspond to
different sampling noise with all the other parameters kept the same. We
start with all the individuals at sequence $\sigma^{(0)}$ with rank 55. Since
$d_{\mathrm{eff}}$ is close to one, the population spreads from here to
sequences within Hamming distance unity of $\sigma^{(0)}$ and moves to the
best sequence amongst them, namely the sequence with rank 28. In this case,
there is no bypassing (discussed in the quasispecies section) of a fit
sequence and the best sequence in the first shell becomes the most populated
sequence $\sigma^*$. As the population at this sequence grows, the
chance that it will produce its one-mutant neighbors also increases; in fact,
a mutant $\sigma$ better than $\sigma^*$ appears at time
$\tau \sim (1/s) \ln(s / N\mu^2)$, where the selection coefficient
$s = s(\sigma, \sigma^*)$, when the fraction at the current $\sigma^*$
becomes $\sim 1/N\mu$ (Wahl and Krakauer 2000). The population then starts
growing at the sequence with rank 5, which is the best sequence in the first
shell centred about the sequence ranked 28. The process so far is
deterministic, as is evident from the three runs. Note that the set
$\{\sigma^*\}$ obtained using the local quasispecies theory will in general
be different from the one obtained from the quasispecies analysis of the
Hamming space containing all shells up to the shell in which the local peak
is situated; this is because the sequences obtained in the former case can be
outcompeted by fitter mutants before reaching fixation, as discussed in the
last section. For instance, if we apply the deterministic prescription
to the Hamming space restricted to shell 2 about $\sigma^{(0)}$, the sequence
ranked 5 will not appear in the trajectory since it will be immediately
overtaken by the global maximum, which also lies in shell 2.

In Figure 3, the sequence with rank 5 is a local peak, so a better sequence
lies beyond distance unity; in fact, it lies in the second shell about this
local peak and carries the rank label 2. The trajectories in Figure 3 take
different routes from here onwards. In all three cases, the last most
populated sequence shown is at a distance 4 from the global maximum, which in
fact lies at distance 2 from the initial sequence. Thus, a finite population
wanders around and is inefficient in its search for the global peak.

Figure 4 shows the evolutionary trajectories for larger $\mu$ (and hence
larger $d_{\mathrm{eff}}$) for fixed population size. In the left panel,
since $d_{\mathrm{eff}} \approx 1.4$, the population finds the best sequence
28 in shell one about $\sigma^{(0)}$ as before. But as the sequence with
globally largest fitness became available due to a mutation in a nearest
neighbor mutant of $\sigma^{(0)}$, the population moves to the global peak.
We performed several runs for this set of parameters and found that $X_5$
never achieved a majority status. On increasing $\mu$ further, corresponding
to $d_{\mathrm{eff}} \approx 2.1$, the sequence with rank 0, being within
$d_{\mathrm{eff}}$ of the initial sequence, became immediately available, and
the population formed a quasispecies around the global peak.

Stochastic dynamics: We now describe the individual trajectories in Figure 3
in some detail. In the top panel, at $t = 7$, a nearest neighbor of
$\sigma^{(0)}$ with rank 40 mutated at one locus to produce an individual at
the rank 4 sequence, which is a local peak. The rank 4 sequence replaces the
rank 5 sequence as the most populated genotype before the rank 5 sequence has
reached fixation. Since the two sequences are 4 point mutations apart, this
constitutes an example of what has been called a leapfrog episode, in which
two consecutive majority genotypes appear that are not closely related to
each other but have a common ancestor further back in the genealogy
(Gerrish and Lenski 1998). Later, a rank 50 neighbor of the rank 4 sequence
mutated once at $t = 996$ to populate the rank 1 sequence, thus enabling the
population to shift from one peak to another.

In the middle panel, although a rank 48 neighbor of the sequence ranked 5
mutated once at $t = 1234$ to produce an offspring with rank 2, this
individual was lost. At $t = 2384$, a double mutation in the sequence ranked
5 allowed the population to shift peaks without crossing the valley. In the
last panel, the population remained trapped at the rank 5 sequence until the
last observed time $t = 10^4$.

The process of shifting peaks via valley crossing (Wright 1932) or
stochastic tunneling (Iwasa et al. 2004; Weinreich and Chao 2005) can
happen if many mutants at Hamming distance unity from a local peak are
available. While the Wrightian concept of valley crossing involves moving the
whole population through a low fitness sequence, the process of stochastic
tunneling only requires the presence of a few low fitness neighbors, and we
discuss this here. During the residence time of the population near the peak,
a mutation-selection balance is reached between the peak genotype and its
one-mutant neighbors. Then the average fraction of population at a given
valley sequence with fitness $W_{\mathrm{mut}}$ can be estimated using the
quasispecies equation, and one has

$$X_{\rm mut} \approx \frac{\mu\, W_{\rm mut}}{W_{\rm loc} - W_{\rm mut}} \qquad (11)$$

where $W_{\rm loc}$ is the fitness of the local peak. Clearly, the total
number of mutants produced depends on the neighborhood of the local peak; if
the fitness of the neighbors is much smaller than that of the local peak,
then it is of the order $N\mu L/W_{\rm loc}$, on using that the average
value of exponentially distributed variables is 1. Otherwise it is dominated
by the population at the best one-mutant neighbor with fitness close to
$W_{\rm loc}$. In Figure 3, the sequence ranked 4 produced on average
$NL\mu \sim 10$ mutants, while rank 5 produced a suite of about 200 mutants,
a lower bound ($\sim 80$) on which can be obtained by using (11) and the
fitness $W_{\rm mut}$ of the rank 6 sequence, which is the fittest nearest
neighbor of the rank 5 sequence.
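
The estimate of the mutant cloud around a local peak can be sketched
numerically. The following is a minimal illustration, assuming the
mutation-selection balance fraction $\mu W_{\rm mut}/(W_{\rm loc} - W_{\rm mut})$
for each one-mutant neighbor; the function names and the numerical values
used in the example are hypothetical and are not taken from the simulations.

```python
def valley_fraction(mu, w_loc, w_mut):
    """Stationary population fraction at one valley sequence.

    Assumes the mutation-selection balance form mu*W_mut/(W_loc - W_mut),
    which only makes sense for a neighbor less fit than the peak.
    """
    if w_mut >= w_loc:
        raise ValueError("neighbor must be less fit than the local peak")
    return mu * w_mut / (w_loc - w_mut)


def expected_mutants(n, mu, w_loc, neighbor_fitnesses):
    """Expected number of individuals on all one-mutant neighbors of the peak."""
    return n * sum(valley_fraction(mu, w_loc, w) for w in neighbor_fitnesses)


# Hypothetical deep valley: 15 neighbors of fitness 1 around a peak of fitness 10.
cloud = expected_mutants(10**4, 1e-4, 10.0, [1.0] * 15)
print(cloud)  # close to N*mu*L/W_loc = 1.5, as the deep-valley estimate suggests
```

For neighbors much less fit than the peak the sum approaches
$N\mu L/W_{\rm loc}$, while a single neighbor with fitness close to
$W_{\rm loc}$ dominates the total.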

Since there are typically many low fitness sequences available in the val- 
ley, it is likely that the population trapped at a local peak escapes due to 
a mutation in one of the N one-mutant neighbors. This gives the simple 
estimate of the tunneling time to be ~ {Nfi'^L)^^ ~ 10'^ for our choice of 
parameters. This in fact is a lower bound as the tunneling time depends 
inversely on the advantage conferred by the next local peak. An expression 
for the rate ($\sim T_{\rm tunnel}^{-1}$) to tunnel to a beneficial mutation
via a deleterious one has been obtained in Iwasa et al. (2004) using a Moran
process (also see Weinreich and Chao (2005)). This is given by the product
of three factors: the average number of deleterious mutants produced, the
mutation probability with which a deleterious mutant mutates to an
advantageous one, and the fixation probability, which is the relative
fitness difference between the final and initial mutants, finally yielding

$$T_{\rm tunnel}^{-1} \sim \frac{N \mu\, W_{\rm mut}}{W_{{\rm loc},i} - W_{\rm mut}} \; \mu \; \frac{W_{{\rm loc},f} - W_{{\rm loc},i}}{W_{{\rm loc},f}} \qquad (12)$$

where $W_{{\rm loc},\{i,f\}}$ refers to the fitness of the initial and the
final local peaks. Inserting the fitness values of the two local peaks in
question, $T_{\rm tunnel}$ turns out to be $\sim 3000$, which is somewhat
larger than that observed in the top panel.

In the middle panel, although many mutants are available at the valley
sequence ranked 6, the population could not tunnel through this sequence
as it does not have a better neighbor other than sequence 5 itself. Instead
a double mutation at t = 757 was responsible for escaping the local peak
at the sequence ranked 5 to the next local peak with rank 2. Since the time
$T_{\rm double}$ for the (desired) double mutation to occur in one
generation is given by (Iwasa et al. 2004; Weinreich and Chao 2005)



$$T_{\rm double} \sim \frac{W_{\rm mut}}{W_{\rm loc} - W_{\rm mut}} \; T_{\rm tunnel}, \qquad (13)$$



it exceeds $T_{\rm tunnel}$ if $W_{\rm mut} \sim W_{\rm loc}$, and in such a
case, tunneling is the dominant mode of escaping the local peak. On the
other hand, the valleys typically encountered in a rugged landscape are
"deep" as $W_{\rm mut} = 1$ and $W_{\rm loc} = \ln L$. In this situation,
the population may attempt to hop across the valley; the probability for
such an event is roughly given by $N\mu^{2}$ times the average number of
fitter neighbors available at distance 2 away. The latter is simply
$(L^{2}/2)\, Q(W(\sigma^{*}))$. Using $W_{\rm loc} = \ln L$, we again find
that the time scale over which a double mutation can occur is of the same
order as the tunneling time.
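
The comparison of the two escape times can be made explicit with a short
worked step. This is only a sketch: it assumes that the tunneling rate is the
product of the three factors named in the text, that the mutant fraction is
$\mu W_{\rm mut}/(W_{\rm loc} - W_{\rm mut})$, and that the direct
double-mutation rate is $N\mu^{2}\pi$ with the same fixation probability
$\pi$ (the symbol $\pi$ is introduced here for illustration only).

```latex
\begin{align*}
T_{\rm tunnel}^{-1} &\sim N \,\frac{\mu W_{\rm mut}}{W_{\rm loc}-W_{\rm mut}}\,\mu\,\pi ,
\qquad
T_{\rm double}^{-1} \sim N\mu^{2}\,\pi , \\
\frac{T_{\rm double}}{T_{\rm tunnel}} &\sim \frac{W_{\rm mut}}{W_{\rm loc}-W_{\rm mut}}
\;\longrightarrow\; \frac{1}{\ln L - 1} \approx 0.59
\qquad (W_{\rm mut}=1,\; W_{\rm loc}=\ln L,\; L=15).
\end{align*}
```

Under these assumptions the ratio is of order one for a deep valley, and it
grows without bound as $W_{\rm mut}$ approaches $W_{\rm loc}$, where tunneling
takes over.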

Crossover time: We now estimate the time $T_\times$ at which the crossover
from deterministic to stochastic evolution occurs using an argument employed
previously by Krug and Karl (2003) and Jain and Krug (2005). We consider
the evolution equation (10) for the unnormalised population according to
which the logarithmic population at a fit sequence increases linearly in
time. Then the crossover time $T_\times$ at which the first local peak is
reached can be approximated by the typical time at which the population at
the first local peak (rank 5 in Figure 3) overtakes the population at the
most populated sequence $\sigma^{*}$ (rank 28) at Hamming distance unity
from it. This is given by



$$T_\times \sim \frac{|\ln \mu|}{\ln\left[ W_{\rm loc}/W(\sigma^{*}) \right]} \qquad (14)$$



For the landscape used in Figure 2, the fitness $W(\sigma^{*}) \approx 0.81$
and $W_{\rm loc} \approx 1.65$, so that $T_\times$ works out to be about 13
time steps, which is in reasonable agreement with the time at which this
peak appears. The dependence of $T_\times$ on $L$ can be found by noting
that generally the fitness ratio in the argument of the logarithm is close
to unity, so that the logarithm can be expanded. The denominator then
reduces to $W_{\rm loc}/W(\sigma^{*}) - 1 \sim 1/W(\sigma^{*})$ on using that
the typical difference between two exponentially distributed independent
random variables is equal to unity (David 1970; Sornette 2000). The fitness
$W(\sigma^{*})$ of the last-but-one most populated sequence in the
quasispecies regime is expected to be of the same order as the fitness of
the local peak, which increases as $\ln L$. Thus for exponentially
distributed fitness and $d_{\rm eff} = 1$, the local quasispecies theory
works over a time scale that increases as

$$T_\times \sim |\ln \mu| \ln L. \qquad (15)$$
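
The arithmetic behind the quoted crossover time can be checked in a few
lines. This is a sketch that assumes the crossover time has the form
$|\ln\mu| / \ln[W_{\rm loc}/W(\sigma^{*})]$ and, as a further assumption not
stated in this passage, a mutation probability $\mu = 10^{-4}$; the function
name is hypothetical.

```python
import math


def crossover_time(mu, w_loc, w_star):
    """Time for the local peak to overtake the current majority sequence.

    Assumes T_x ~ |ln mu| / ln(W_loc / W(sigma*)), i.e. the peak starts one
    mutation behind and catches up at a rate set by the fitness ratio.
    """
    if w_loc <= w_star:
        raise ValueError("the local peak must be fitter than sigma*")
    return abs(math.log(mu)) / math.log(w_loc / w_star)


# Fitness values quoted in the text; mu = 1e-4 is an assumed value.
t_x = crossover_time(1e-4, 1.65, 0.81)
print(round(t_x))
```

With these inputs the result rounds to 13 time steps, in line with the value
quoted above.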

Although we mainly discussed the case $d_{\rm eff} = 1$ above, it is easy to
see that for larger effective distance also, the local quasispecies theory
will work up to a crossover time after which the population will get
trapped at a "local peak" which does not have a better sequence available
within Hamming distance $d_{\rm eff}$ and will have to wait for a rare
mutation to find a better sequence. For $d_{\rm eff} \ll L$, the crossover
time can be easily generalised by approximating it by the time required for
the last overtaking event to happen, which is given by (Krug and Karl 2003;
Jain and Krug 2005)

$$T_\times(d_{\rm eff}) \sim \frac{|\ln \mu|}{\ln\left[ W_{\rm loc}/W(\sigma^{*}) \right]}. \qquad (16)$$

Expanding the logarithm as above, and using that the peak genotype is the
best amongst $\sim L^{d_{\rm eff}}$ sequences, it follows that

$$T_\times(d_{\rm eff}) \sim d_{\rm eff}\, |\ln \mu| \ln L. \qquad (17)$$

Fully stochastic evolution: We now turn to the regime when the effective
distance is less than unity. Unlike in the previous cases, now the
dynamics are stochastic at all times. The parameter $d_{\rm eff} < 1$
implies that the average number of mutants $N\mu$ produced at Hamming
distance unity is also smaller than 1. Since the population is discrete,
this number cannot be observed until time $\sim (N\mu)^{-1}$ when one mutant
is produced at a given sequence. However, since the mutation probability is
rotationally symmetric, a total of $\sim LN\mu$ new mutants at Hamming
distance unity can be produced in one generation. The dynamics depend on
whether the parameter $LN\mu$ is above or below unity, and we study these
two cases in the following subsections. We will mainly focus on the short
time regime as the behavior at long times is expected to be similar to that
discussed previously.
Clonal interference: Figure 5 shows the temporal evolution of the
population fraction for three different realizations of the sampling noise
(keeping the rest of the parameters the same). Clearly the population
traces different trajectories in each case. In Figure 5, the population at
$\sigma^{(0)}$ produced a total of $NL\mu \sim 1$ to $2$ mutants in one
generation. Thus in this regime, the sequence space is very sparsely
populated as only 2 to 3 genotypes are occupied. But since many (about
$L\,Q(W(\sigma^{(0)})) \sim 13$) of the neighbors are better than the
parent, the population immediately begins the hill-climbing process. In the
top panel of Figure 5, the best one-mutant neighbor of $\sigma^{(0)}$ with
rank 4688 mutated once at t = 6 to move the population to a highly fit
sequence ranked 159, which is also a local peak. In the middle panel, while
most of the population climbed to the nearest neighbor of the parent with
rank 9195, an individual at a much lower rank 20940 produced an offspring at
rank 4117 at t = 5. Thus, due to the interference of the rank 20940
sequence, the population managed to access an even fitter sequence. After a
single mutation at the genotype ranked 4117, the population reached a local
peak with rank 1524, from where it escaped via a double mutation. In the
last panel, at t = 5, the rank 14622 neighbor of $\sigma^{(0)}$ mutated once
to populate a local peak with rank 2711. However, the population escaped
this local peak by climbing a better local peak with rank 5, made available
due to one mutation in sequence 14622 at t = 7. In each case, since the
selection coefficients involved are of order unity, the fitter mutants get
fixed immediately and one can neglect the time to reach fixation.

Figure 5: Stochastic trajectories for $L = 15$, $N = 2^{10}$,
$\mu = 10^{-4}$ with $N\mu \approx 0.10$ and $LN\mu \approx 1.54$. The
population passes through different routes in each case right from the
beginning and at short times, several mutants at constant Hamming distance
are produced simultaneously. Only the mutants that achieve a fraction
> 0.005 are shown in the plot. In the top panel, all the mutants shown
belong to the same lineage; in the next two panels, while a fit mutant is on
its way to fixation, a split in the lineage produced an even better mutant,
thus bypassing the former one.

In the preceding sections with $d_{\rm eff} \ge 1$, all the mutants are
available within the occupied shells and the best amongst them becomes the
most populated sequence $\sigma^{*}$. However, for $N\mu < 1$, only a few
randomly sampled sequences can get populated and as most of the genotypes
available at Hamming distance one from $\sigma^{(0)}$ are of comparable
fitness, each of them can achieve a moderate population frequency. While the
best amongst them has the highest chance of achieving majority status, the
other mutants can meanwhile establish their own lineage by creating their
own (small) suite of one-mutant neighbors. If a mutant better than the one
that is currently going to fixation is produced, there is a competition and
the latter is bypassed. This process is reminiscent of the bypassing
discussed in the quasispecies section: in both cases, while a fit mutant is
going to fixation, it may get bypassed by an even better one. However, while
the set of mutants that will compete with each other in this manner is
predetermined for large populations, here they are stochastically generated
in time.

The competition between several beneficial mutations in an asexual
population has been termed clonal interference (Gerrish and Lenski 1998).
A quantitative criterion for the occurrence of clonal interference, adapted
to the present situation, reads (Wilke 2004)

$$2 N L \mu \ln N > 1, \qquad (18)$$

which is clearly satisfied in Figure 5. However, the usual view of clonal
interference as an impediment to the simultaneous fixation of different
beneficial mutations which slows down adaptation (Gerrish and Lenski 1998;
Wilke 2004) relies on a situation in which the fitness effects are
essentially additive, and hence strong (sign) epistasis is absent. In rugged
fitness landscapes, on the other hand, the presence of several competing
genotypes increases the likelihood of finding high fitness genotypes. This
effect is thus seen to speed up the adaptive process compared to the regime
where beneficial mutations arise and fix sequentially, which we consider
next.
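
The clonal interference criterion is easy to evaluate for a given parameter
set. The sketch below assumes the criterion has the form
$2NL\mu \ln N > 1$ and uses $L = 15$, $N = 2^{10}$ and $\mu = 10^{-4}$, as
read from the Figure 5 caption; the function name is hypothetical.

```python
import math


def clonal_interference_lhs(n, length, mu):
    """Left hand side of the criterion 2 N L mu ln N > 1."""
    return 2 * n * length * mu * math.log(n)


# Parameters as read from the Figure 5 caption (L = 15, N = 2**10, mu = 1e-4).
lhs = clonal_interference_lhs(2**10, 15, 1e-4)
print(lhs > 1)  # True: several beneficial mutants are expected to compete
```

Lowering $\mu$ by one order of magnitude at fixed $L$ and $N$ reduces the
left hand side by the same factor, which is how the adaptive walk regime
discussed next is approached.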
Adaptive walk: The above discussion of course is contingent on the fact that
several genotypes are available to explore the landscape. We finally consider
the case in which the rate $LN\mu$ at which new mutants appear is very
small. Then the time $(LN\mu)^{-1} \gg 1$ required to produce a new mutant
is very large, and competing mutants are not produced, enabling the
population at the currently occupied genotype to reach a fraction unity. The
population is thus localised at a single sequence at all times, unlike in
the previous cases where this happened only at long times. In Figure 6, the
dynamics in the regime $LN\mu < 1$ are shown for three different values of
$\mu$ with fixed $L$ and $N$. The effect of decreasing $\mu$ is similar to
the quasispecies model in that the adaptive events are delayed and the
polymorphism is reduced. Since the dynamics are now stochastic, the
trajectories are different and an averaging is required to deduce the effect
on fitness.

At short times, the number of occupied genotypes decreases with decreasing
mutation probability. At late times, however, the population can be
associated with a single sequence for large $\mu$ also due to a reduction in
$Q(W)$. In the topmost panel of Figure 6, the left hand side of (18) is
about 2, and correspondingly several genotypes coexist at early times. For
$LN\mu \ll 1$, as in the bottom panel of Figure 6, the population shifts as
a whole by one Hamming distance. Since the mutation probability is small, to
a first approximation, the population is likely to move only by one step and
the hops to larger distances can be neglected. Thus, the population keeps
moving one step uphill on the rugged landscape until it encounters a local
peak, whereupon this adaptive walk stops. The typical length of this walk is
$\ln L \approx 3$ for $L = 15$ (Flyvbjerg and Lautrup 1992; Kauffman 1993).
For $\mu = 10^{-7}$ in the bottom panel of Figure 6, the population reaches
the sequence with rank 2947, which is a local peak. The time to escape this
sequence to a fitter one in the shell at Hamming distance two is of order
$(LN\mu^{2})^{-1}$ (as discussed above), which for our choice of parameters
will require about $10^{10}$ time steps. For the small $N$ and $\mu$ that we
consider here, it may be possible for a valley mutant to get fixed before
the next local peak does. This requires that the time to fix a valley mutant
is smaller than the time $\sim (N\mu)^{-1}$ to produce its one-mutant
neighbor with fitness $W_{{\rm loc},f}$ (Carter and Wagner 2002; Nowak et al.
2004). The valley mutant fixation time is exponentially large in $N$ if the
mutant fitness $W_{\rm mut} \ll W_{{\rm loc},i}$, while it is of order $N$
for the near neutral case. Clearly, the above requirement can be met only
when the population escapes through a "shallow" valley, which is a rather
unlikely scenario in a rugged landscape.

Figure 6: Population evolution when $LN\mu \ll 1$ for $L = 15$,
$N = 2^{10}$ with $\mu = 10^{-5}$, $10^{-6}$ and $10^{-7}$ (top to bottom).
The mutants with population fraction $X > 0.005$ are shown.

Before the population gets trapped at a local peak, the dynamics can
be described by the mutational landscape model (Gillespie 1984), which
applies to a genetically homogeneous population undergoing beneficial
mutation with a very low probability. As pointed out in Orr (2002), the
behavior of the population undergoing an adaptive walk is neither
deterministic nor completely random in the sense that each (better) mutant
would be equally likely to get fixed. In fact, each one-mutant neighbor
better than the currently occupied one has a probability to get fixed given
by



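$$P_{\rm fix}(\sigma|\sigma^{(0)}) = \frac{\Pi(\sigma|\sigma^{(0)})}{\sum_{\sigma'} \Pi(\sigma'|\sigma^{(0)})} \qquad (19)$$

where the sum is over the fitter nearest neighbors of $\sigma^{(0)}$, and
the unnormalized fixation probability is given by

$$\Pi(\sigma|\sigma^{(0)}) = \frac{s(\sigma,\sigma^{(0)})}{1+s(\sigma,\sigma^{(0)})} = 1 - \frac{W(\sigma^{(0)})}{W(\sigma)} \qquad (20)$$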



for large $N$ (Durrett 2002). In the last panel of Figure 6, the probability
for the sequence ranked 25483 to get fixed is $\sim 0.049$, which is almost
half of the fixation probability $\sim 0.095$ of the best available sequence
with rank 4688.
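
The fixation rule of the mutational landscape model lends itself to a short
sketch. The code below assumes the unnormalized fixation probability
$1 - W(\sigma^{(0)})/W(\sigma)$ for fitter neighbors, normalized over all
fitter one-mutant neighbors, and uses it to drive an uphill walk on a
maximally rugged landscape with independent, exponentially distributed
fitness values. All function names and the example fitness values are
hypothetical; this is an illustration, not the simulation code behind the
figures.

```python
import random


def unnormalized_fixation(w_new, w_old):
    """1 - W_old/W_new for a fitter mutant, zero otherwise."""
    if w_new <= w_old:
        return 0.0
    return 1.0 - w_old / w_new


def fixation_probabilities(w_current, neighbor_fitnesses):
    """Normalize the fixation weights over the fitter neighbors."""
    weights = [unnormalized_fixation(w, w_current) for w in neighbor_fitnesses]
    total = sum(weights)
    if total == 0.0:
        return [0.0] * len(weights)  # current sequence is a local peak
    return [x / total for x in weights]


def adaptive_walk(length, rng):
    """Walk uphill until a local peak; returns (number of steps, peak genotype)."""
    fitness = {}

    def w(genotype):
        # Fitness is drawn once per genotype: i.i.d. exponential with mean 1.
        if genotype not in fitness:
            fitness[genotype] = rng.expovariate(1.0)
        return fitness[genotype]

    current = rng.randrange(2 ** length)
    steps = 0
    while True:
        neighbors = [current ^ (1 << i) for i in range(length)]
        probs = fixation_probabilities(w(current), [w(g) for g in neighbors])
        if sum(probs) == 0.0:
            return steps, current
        current = rng.choices(neighbors, weights=probs, k=1)[0]
        steps += 1


# Hypothetical example: current fitness 1, neighbors with fitness 0.5, 2 and 4.
print(fixation_probabilities(1.0, [0.5, 2.0, 4.0]))
print(adaptive_walk(15, random.Random(0)))
```

In the example the less fit neighbor can never fix, and the fittest neighbor
is favored but not certain to win, which is the sense in which the walk is
neither deterministic nor uniformly random.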

DISCUSSION 

In this article, we posed the question under what conditions biological
evolution is predictable. To answer this, we studied the dynamics of a finite
population within a mutation-selection model defined on the space of
binary genotype sequences of length $L$. This work thus considers $L$-locus
models, unlike that of Rouzine et al. (2001) which focuses on the one-locus
problem. Our simulations also differ from those of Wahl and Krakauer
(2000), where the dynamics are described by the quasispecies equation (1)
as long as the population fraction $X(\sigma, t)$ exceeds $1/N$, and if the
fraction falls below this cutoff, an individual is added to sequence
$\sigma$ with a certain probability. We have instead simulated the full
stochastic process defined by Wright-Fisher dynamics, which allows us to
track the exact evolutionary path of any mutant. The fitness landscape under
consideration is highly epistatic with many local optima.
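
For readers unfamiliar with Wright-Fisher dynamics, one generation of such a
process can be sketched as follows. This is a minimal illustration under
stated assumptions: genotypes are stored as integers whose bits are the loci,
selection is modeled by fitness-proportional sampling of parents, and
mutation flips each locus independently with probability $\mu$ after
selection. The ordering of selection and mutation, the function name and the
example fitness function are assumptions made for this sketch.

```python
import random


def wright_fisher_step(population, fitness, mu, length, rng):
    """Return the next generation of a fixed-size asexual population.

    population: list of integer genotypes (bit i is locus i)
    fitness:    function mapping a genotype to a positive fitness
    """
    weights = [fitness(g) for g in population]
    # Selection: each offspring draws a parent with probability
    # proportional to the parent's fitness (sampling with replacement).
    parents = rng.choices(population, weights=weights, k=len(population))
    offspring = []
    for genotype in parents:
        # Mutation: every locus flips independently with probability mu.
        for locus in range(length):
            if rng.random() < mu:
                genotype ^= 1 << locus
        offspring.append(genotype)
    return offspring


# Hypothetical usage: 50 individuals, 5 loci, fitness rising with the number of 1s.
rng = random.Random(0)
population = [0] * 50
for _ in range(10):
    population = wright_fisher_step(
        population, lambda g: 1.0 + bin(g).count("1"), 0.05, 5, rng
    )
print(sorted(set(population)))
```

Because every individual is tracked explicitly, the lineage of any mutant can
be followed, at a cost that grows with $N L$ per generation.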

We classified the various evolutionary regimes using a parameter
$d_{\rm eff}$ defined in (6), which has been obtained under the assumption
of strong selection. Usually the boundary between deterministic and
stochastic evolution is defined by the product $N\mu$ (Johnson et al. 1995;
Wahl and Krakauer 2000; Rouzine et al. 2001); as most of these theories are
based on one-locus models (Johnson et al. 1995; Rouzine et al. 2001), the
description in terms of $N\mu$ suffices. We are instead dealing with the
whole sequence space in which mutations can occur to a distance greater than
unity depending on the population size $N$ and mutation probability $\mu$.
This requires a description in terms of the distance $d_{\rm eff}$, which
measures the typical distance to which the mutants can spread. The boundary
$N\mu = 1$ is included in our description as this corresponds to
$d_{\rm eff} = 1$. However, in contrast to the product $N\mu$, the
logarithmic dependence of (6) implies that moderate changes in
$d_{\rm eff}$ require enormous changes of $N$ or $\mu$.

Our conclusions, summarised in Table 1, fall into three broad categories.
The infinite population case with $d_{\rm eff} = L$ is described by the
deterministic quasispecies model (Eigen 1971). Given the fitness landscape
and the starting point, one can predict the path taken by the initially
unfit population to a peak in the landscape. For finite populations with
$d_{\rm eff} \ge 1$, although the long time course is determined by
stochastically occurring rare mutations, it is possible to predict the
trajectory until a time $T_\times$ (Equation 17) that increases with $L$ and
$N$ using the deterministic prescription locally. We emphasize that the
dynamics described by the local quasispecies theory, which applies to shells
of size $d_{\rm eff}$ centred about the current $\sigma^{*}$, is different
from the quasispecies theory applied to the Hamming space restricted to the
shells up to the one in which the local peak is located. This is simply
because the initial population $\mu^{d}$ at the local peak in question can
be smaller than $1/N$. The intuitive picture provided by the local
quasispecies theory is in fact equivalent to the description in Wahl and
Krakauer (2000), where quasispecies theory is applied to the full space
provided the lower cutoff $1/N$ is imposed. The viewpoint that quasispecies
dynamics can be useful in understanding the behavior of finite populations
has been expressed by other authors also (see, for example, Wilke (2005)).

The local quasispecies description breaks down when the population fails
to find a genotype better than the currently occupied one within distance
$d_{\rm eff}$. Then rare mutations (of the order $\mu^{d_{\rm eff}+1}$) that
allow the population to access a distance $> d_{\rm eff}$ play an important
role. On rugged landscapes, the population can escape this situation either
by double mutations (for $N\mu \sim 1$) or by tunneling through the low
fitness mutants (Iwasa et al. 2004; Weinreich and Chao 2005). Large
populations are able to cross a fitness valley much more rapidly than
expected on the basis of the adaptive walk picture, in which the fixation of
a deleterious mutation is exponentially unlikely (van Nimwegen and
Crutchfield 2000; Gavrilets 2004; Weinreich and Chao 2005). The reason is
that in a large population the less fit genotypes connecting the two fitness
peaks are always present in some number, enabling the population to climb
the new peak without ever in its entirety residing in the valley. This is
similar to the peak shift mechanism found in the quasispecies model, where
all possible mutants are always present in the population (Jain and Krug
2005, 2006).

To summarise, there is a crossover in the dynamics when $d_{\rm eff} \ge 1$
from deterministic quasispecies-type dynamics to stochastic dynamics in
which stochastic escapes occur. For RNA viruses with typical population size
$N \sim 10^{6}$ and mutation probability $\mu \sim 10^{-3}$ per base per
generation in a genome of about a thousand bases (Lazaro 2006), these
parameters give $d_{\rm eff} \sim 2$, which suggests that the local
quasispecies dynamics operate in finite viral populations for short times.
This scenario is expected to hold good for HIV also, for which the product
$N\mu \sim 1$ (Rouzine and Coffin 1999).

For $N\mu < 1$, the dynamics are stochastic right from the start. The
long time dynamics are expected to be qualitatively similar to those
discussed above. But the short time dynamics differ considerably and depend
on the number of one-mutant neighbors. While many analytical results are
available for the adaptive walk limit (Gillespie 1984; Kauffman 1993), the
parameter regime when $NL\mu$ is not too small on epistatic landscapes
requires further attention. In experiments on E. coli, which has
$L \sim 10^{6}$, $\mu \sim 10^{-10}$ and typical colony sizes of order
$10^{6}$, $N\mu \ll 1$ but $LN\mu \gg 1$, which hints at the stochastic
nature of the bacterial evolution. This behavior has been seen in the
experiments by the Lenski group in which the fitness of bacterial
populations evolving under identical conditions diverged in time (Korona et
al. 1994; Lenski and Travisano 1994).

            $d_{\rm eff} = L$   $1 \le d_{\rm eff} < L$                  $d_{\rm eff} < 1$

Behavior    Deterministic       Crossover deterministic -> stochastic    Stochastic

Regime      Quasispecies        $t < T_\times$: Local quasispecies       $LN\mu > 1$: Clonal interference
                                $t > T_\times$: Valley crossing          $LN\mu < 1$: Adaptive walk
                                or hopping

Table 1: Summary of regimes in evolution on rugged landscapes, where
$d_{\rm eff} = \ln N/|\ln \mu|$.
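
The classification in Table 1 can be turned into a small helper. The sketch
below assumes $d_{\rm eff} = \ln N / |\ln \mu|$ and treats the regime
boundaries as sharp thresholds, which is a simplification since the text
describes crossovers; the function names and returned labels are
hypothetical.

```python
import math


def effective_distance(n, mu):
    """d_eff = ln N / |ln mu|: typical distance to which mutants can spread."""
    return math.log(n) / abs(math.log(mu))


def classify_regime(n, mu, length):
    """Map (N, mu, L) to a regime of Table 1, using sharp thresholds."""
    d_eff = effective_distance(n, mu)
    if d_eff >= length:
        return "quasispecies"
    if d_eff >= 1:
        return "local quasispecies, then valley crossing or hopping"
    if length * n * mu > 1:
        return "clonal interference"
    return "adaptive walk"


print(classify_regime(2**10, 1e-4, 15))  # parameters of the Figure 5 type
print(classify_regime(2**10, 1e-7, 15))  # much smaller mutation probability
```

Because $d_{\rm eff}$ depends only logarithmically on $N$ and $\mu$, moving a
population from one column of the table to another requires changing either
parameter by orders of magnitude.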



In this article, we have provided a unified picture of the nature of the 
evolutionary process. As our models are defined on sequence space, this con- 
stitutes a step towards realistic modeling of the biological evolution occurring 
in the genotypic space. Inclusion of other relevant factors such as recombi- 
nation could be the next step in our understanding of genetic evolution. 
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